A Stata Plugin for Estimating Group-Based Trajectory Models
Abstract
Group-based trajectory models are used to investigate population differences in the developmental courses of behaviors or outcomes. This article demonstrates a new Stata command, traj, for fitting finite (discrete) mixture models to longitudinal data. These models are designed to identify clusters of individuals following similar progressions of some behavior or outcome over age or time. Censored normal, Poisson, zero-inflated Poisson, and Bernoulli distributions are supported. Applications to psychometric scale data, count data, and a dichotomous prevalence measure are illustrated.

Introduction

A developmental trajectory measures the course of an outcome over age or time. The study of developmental trajectories is a central theme of developmental and abnormal psychology and psychiatry, of life course studies in sociology and criminology, and of physical and biological outcomes in medicine and gerontology. A wide variety of statistical methods are used to study these phenomena. This article demonstrates a Stata plugin for estimating group-based trajectory models. The Stata program adapts a well-established SAS-based procedure for estimating group-based trajectory models (Jones, Nagin, and Roeder 2001; Jones and Nagin 2007) to the Stata platform.

Group-based trajectory modeling is a specialized form of finite mixture modeling. The method is designed to identify groups of individuals following similar developmental trajectories. For a recent review of applications of group-based trajectory modeling, see Nagin and Odgers (2010); for an extended discussion of the method, including technical details, see Nagin (2005).
A Brief Overview of Group-Based Trajectory Modeling

Using finite mixtures of suitably defined probability distributions, the group-based approach to modeling developmental trajectories is intended to provide a flexible and easily applied method for identifying distinctive clusters of individual trajectories within the population and for profiling the characteristics of individuals within those clusters. Whereas hierarchical and latent curve methodologies model population variability in growth with multivariate continuous distribution functions, the group-based approach uses a multinomial modeling strategy. Technically, the group-based trajectory model is an example of a finite mixture model. The model parameters are estimated by maximum likelihood, and the maximization is performed with a general quasi-Newton procedure (Dennis, Gay, and Welsch 1981; Dennis and Mei 1979).

The fundamental quantity of interest is the distribution of outcomes conditional on age (or time)¹, that is, the distribution of outcome trajectories, denoted $P(Y_i \mid Age_i)$, where the random vector $Y_i$ represents individual $i$'s longitudinal sequence of behavioral outcomes and the vector $Age_i$ represents individual $i$'s age when each of those measurements is recorded. The group-based trajectory model assumes that the population distribution of trajectories arises from a finite mixture of unknown order $J$. The likelihood for each individual $i$, conditional on the number of groups $J$, may be written as

$$P(Y_i \mid Age_i) = \sum_{j=1}^{J} \pi_j \, P(Y_i \mid Age_i, j; \beta^j) \qquad (1)$$

where $\pi_j$ is the probability of membership in group $j$, and the conditional distribution of $Y_i$ given membership in group $j$ is indexed by the unknown parameter vector $\beta^j$, which among other things determines the shape of the group-specific trajectory. The trajectory is modeled with up to a 5th-order polynomial function of age (or time).

¹ Trajectories can also be defined by time (e.g., time from treatment).
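To make equation (1) concrete, here is a minimal Python sketch. It assumes the group-conditional likelihoods $P(Y_i \mid Age_i, j; \beta^j)$ have already been computed and shows only how group-specific polynomial trajectories and the mixing probabilities $\pi_j$ fit together. The helper names `trajectory` and `mixture_likelihood` are hypothetical and are not part of traj.

```python
import math


def trajectory(beta, age):
    """Group-specific polynomial trajectory: beta[0] + beta[1]*age + ... + beta[k]*age**k.

    The text allows polynomials up to 5th order, so beta holds 1 to 6 coefficients.
    """
    if not 1 <= len(beta) <= 6:
        raise ValueError("polynomial order must be between 0 and 5")
    return sum(b * age ** k for k, b in enumerate(beta))


def mixture_likelihood(pi, group_likelihoods):
    """Equation (1): P(Y_i | Age_i) = sum_j pi_j * P(Y_i | Age_i, j; beta^j)."""
    if len(pi) != len(group_likelihoods):
        raise ValueError("need exactly one likelihood per group")
    if any(p < 0 for p in pi) or not math.isclose(sum(pi), 1.0):
        raise ValueError("group membership probabilities must be nonnegative and sum to 1")
    return sum(p * lik for p, lik in zip(pi, group_likelihoods))


# Hypothetical two-group example: a flat group and a rising quadratic group.
flat = trajectory([0.5], 15.0)                # order 0: constant 0.5
rising = trajectory([1.0, 0.5, -0.02], 10.0)  # order 2: 1 + 5 - 2 = 4.0
print(flat, rising)
print(mixture_likelihood([0.3, 0.7], [0.1, 0.2]))  # 0.3*0.1 + 0.7*0.2
```

Because the group likelihoods enter equation (1) linearly, each individual's likelihood is a weighted average of how well each group's trajectory fits that individual's data, with the weights given by the group membership probabilities.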
For given $j$, conditional independence is assumed for the sequential realizations of the elements of $Y_i$, $y_{it}$, over the $T$ periods of measurement. Thus, we may write

$$P(Y_i \mid Age_i, j; \beta^j) = \prod_{t=1}^{T} p(y_{it} \mid age_{it}, j; \beta^j) \qquad (2)$$

where $p(\cdot)$ is the distribution of $y_{it}$ conditional on membership in group $j$ and the age of individual $i$ at time $t$.

The software provides three alternative specifications of $p(\cdot)$: the censored normal distribution, also known as the Tobit model; the zero-inflated Poisson distribution, of which the ordinary Poisson is the special case with no zero inflation; and the binary logit distribution. The censored normal distribution is designed for the analysis of repeatedly measured, approximately continuous scales that may be censored by a scale minimum, a scale maximum, or both (e.g., longitudinal data on a scale of depression symptoms). A special case is a scale or other outcome variable with no minimum or maximum. The zero-inflated Poisson distribution is designed for the analysis of longitudinal count data (e.g., arrests by age), and the binary logit distribution for the analysis of longitudinal data on a dichotomous outcome variable (e.g., whether or not an individual was hospitalized in year $t$).

The model can also estimate the effects of time-stable covariates on the probability of group membership and the effects of time-dependent covariates on the trajectory itself. Let $x_i$ denote a vector of time-stable covariates thought to be associated with the probability of trajectory group membership. Effects of time-stable covariates are modeled with a generalized logit function where, without loss of generality:
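The three choices for $p(\cdot)$, the conditional-independence product in equation (2), and a generalized logit for group membership can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the traj implementation. It assumes the polynomial in age gives the latent mean of the censored normal, the log of the Poisson rate, and the log-odds of the binary outcome. It holds the zero-inflation probability `omega` constant rather than modeling it. For the generalized logit, it assumes the common normalization that fixes the first group's coefficients at zero as the reference category. All function names are hypothetical.

```python
import math


def poly(beta, age):
    """Polynomial trajectory in age (up to 5th order, i.e. at most 6 coefficients)."""
    if not 1 <= len(beta) <= 6:
        raise ValueError("polynomial order must be between 0 and 5")
    return sum(b * age ** k for k, b in enumerate(beta))


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_normal(y, mu, sigma, lo=-math.inf, hi=math.inf):
    """Tobit likelihood: probability mass at the censoring bounds, normal density inside."""
    if y <= lo:
        return normal_cdf((lo - mu) / sigma)
    if y >= hi:
        return 1.0 - normal_cdf((hi - mu) / sigma)
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def zip_pmf(y, lam, omega):
    """Zero-inflated Poisson; omega = 0 reduces to the ordinary Poisson."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return omega + (1.0 - omega) * poisson if y == 0 else (1.0 - omega) * poisson


def logit_pmf(y, eta):
    """Binary logit: P(y = 1) = 1 / (1 + exp(-eta))."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return p if y == 1 else 1.0 - p


def group_likelihood(ys, ages, beta, density):
    """Equation (2): product over periods t of p(y_it | age_it, j; beta^j).

    `density(y, eta)` maps an observation and the polynomial's value to a likelihood.
    """
    lik = 1.0
    for y, age in zip(ys, ages):
        lik *= density(y, poly(beta, age))
    return lik


def group_probabilities(x, thetas):
    """Generalized logit: pi_j(x_i) = exp(x_i'theta_j) / sum_k exp(x_i'theta_k).

    The identifying normalization assumed here sets thetas[0] to zeros (reference
    group); x should include a leading 1 for the intercept.
    """
    scores = [sum(t * v for t, v in zip(theta, x)) for theta in thetas]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical usage: a count outcome with log(lambda) linear in age.
counts, ages = [0, 1, 3], [12.0, 14.0, 16.0]
lik = group_likelihood(counts, ages, [-2.0, 0.15],
                       lambda y, eta: zip_pmf(y, math.exp(eta), omega=0.1))
print(lik)
print(group_probabilities([1.0, 0.0], [[0.0, 0.0], [0.5, -1.0]]))
```

Multiplying per-period likelihoods is exactly what the conditional-independence assumption permits. With many measurement periods, summing log-likelihoods instead would avoid numerical underflow.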